Slightly tweaking the traditional counterfactual, difference-in-differences (DiD) asks: “what would happen to the trend for this unit had it never received the treatment?”
How would your rate of growth change if you ate more vegetables?
What would happen to inflation if the Federal Reserve lowered interest rates?
How would the Arab Spring have unfolded if participants lacked access to cell phones and social media?
Probably the oldest non-experimental method of causal inference (likely dates back to 1855)
Units must be observed before and after the “treatment”, so most commonly applied to panel data.
If assumptions are met, can control for both observed and unobserved confounding.
The 1854 Broad Street cholera outbreak killed over 600 people in a poor district of London. What caused the outbreak?
What causes cholera in general?
What interventions work?
The immediate and chief cause of diseases is atmospheric impurity arising from decomposing remnants of the substances used for food and from the impurities given out from their own bodies. (Neil Arnott, 1844)
Snow, however, found the initial outbreak was clustered around a single water pump on Broad Street. (73 of 83 initial deaths were nearer to the Broad Street pump than to any other.)
Largely at Snow’s behest, the pump’s handle was removed, and the epidemic subsided, but does this tell us much? Outbreaks tend to subside!
The Southwark and Vauxhall water company supplied 40,000+ homes from a reservoir that drew directly from the Thames.
Supply had a well-established reputation for being…gross.
John Edwards “Sovereign of scented streams”
Lambeth waterworks, while it also drew from the Thames, moved their reservoir far upstream of the city in 1852.
| Water supply | Cholera deaths, 1849, rate per 100,000 | Cholera deaths, 1854, rate per 100,000 |
|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 |
| Lambeth Company only | 847 | 193 |
Note that the companies have different starting points (Lambeth was already cleaner even by 1849), but miasma theory might lead you to expect the same trend.
If we can assume a parallel trend, then the relationship should look like this. The effect size, then, would be the difference between the counterfactual case and the observed case.
| Water supply | Cholera deaths, 1849, rate per 100,000 | Cholera deaths, 1854, rate per 100,000 | Difference in rates comparing 1854 to 1849, rate per 100,000 |
|---|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 | 118 |
| Lambeth Company only | 847 | 193 | −653 |
| Difference-in-difference, Lambeth versus Southwark & Vauxhall | −502 | −1273 | −771 |
Answers the question “what would have happened to the treated units if they had not received the treatment?” The quantity estimated is the average treatment effect on the treated (ATT).
i.e. “if Lambeth had not moved the reservoir upstream, there would have been a parallel increase in the number of cholera deaths among their customers”
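The counterfactual logic can be written out as arithmetic. This is a minimal sketch using the four rates from the table; the variable names are illustrative. Because it works from the rounded rates, the intermediate differences can be off by one from the table's difference column (117 rather than 118), presumably due to rounding in the published figures.

```r
# Cholera death rates per 100,000, taken from the table above
rates <- c(sv_1849 = 1349, sv_1854 = 1466,
           lambeth_1849 = 847, lambeth_1854 = 193)

# Change in the control group (Southwark & Vauxhall)
control_change <- rates[["sv_1854"]] - rates[["sv_1849"]]            # 117

# Counterfactual: Lambeth's 1854 rate had it followed the control trend
counterfactual_lambeth_1854 <- rates[["lambeth_1849"]] + control_change  # 964

# ATT: observed Lambeth rate minus the counterfactual rate
att <- rates[["lambeth_1854"]] - counterfactual_lambeth_1854         # -771
att
```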
But for [the treatment] the trends between treated and control units should be parallel
Does not require an assumption that observations are balanced on expected values of the outcome. Unobserved confounding only matters to the extent it impacts the trend.
Parallel trends: lines would be parallel but for the treatment
Exogeneity of treatment with respect to expected trends: treatment isn’t a response to baseline outcome or expected outcomes.
No spillover: untreated units aren’t impacted by treatment.
Stable groups: the before/after populations for each group are the same
For a simple 2-group x 2-time period DiD model, we can get this entire thing from a fairly simple OLS model:
\[ \hat{Y} = B_0 + B_1 \text{Time} + B_2\text{Treated} + B_3 (\text{Time} \times \text{Treated}) \]
\(B_0\): the average outcome for the control group at \(T=0\)
\(B_1\): the change in the control group's average from \(T=0\) to \(T=1\)
\(B_2\): the difference between the treated and control units at \(T=0\)
\(B_3\): the difference in the before-to-after change for the treated group compared to the control group (the difference-in-difference estimate)
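To see why the interaction coefficient is the DiD estimate, it can help to write out the four fitted cell means implied by the model. The expectation notation is added here for illustration:

\[ \begin{aligned} E[Y \mid \text{Treated}=0, \text{Time}=0] &= B_0 \\ E[Y \mid \text{Treated}=0, \text{Time}=1] &= B_0 + B_1 \\ E[Y \mid \text{Treated}=1, \text{Time}=0] &= B_0 + B_2 \\ E[Y \mid \text{Treated}=1, \text{Time}=1] &= B_0 + B_1 + B_2 + B_3 \end{aligned} \]

The change over time is \(B_1 + B_3\) for the treated group and \(B_1\) for the control group, so the difference between the two changes is \((B_1 + B_3) - B_1 = B_3\).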
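library(tidyverse)
library(broom)  # tidy() comes from broom, which library(tidyverse) does not attach
# Cholera death rates per 100,000: control = Southwark & Vauxhall, treatment = Lambeth
df <- data.frame(
  period = factor(rep(c(0, 1), 2), labels = c("before", "after")),
  group  = factor(rep(c(0, 1), each = 2), labels = c("control", "treatment")),
  deaths = c(1349, 1466, 847, 193)
)
# period * group expands to both main effects plus their interaction
model <- lm(deaths ~ period * group, data = df)
tidy(model) |>
  select(term, estimate)

| term | estimate |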
|---|---|
| (Intercept) | 1.35e+03 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |
In this setup, the interaction term represents our difference-in-difference estimate
| Water supply | Deaths 1849 | Deaths 1854 | 1854 - 1849 |
|---|---|---|---|
| S & V | 1349 | 1466 | 118 |
| Lambeth | 847 | 193 | −653 |
| DiD | −502 | −1273 | −771 |
The 2x2 estimate can be generalized to multiple groups and multiple periods with a two-way fixed effects (FE) model, which uses a fixed effect for each group and each time period in place of the single indicators for treated vs. control and before vs. after:
\[ \hat{Y}_{gt} = \alpha_g + \gamma_t + \delta X_{gt} \] \[ \alpha_g = \text{Group Fixed Effect} \]
\[ \gamma_t = \text{Time Fixed Effect} \] \[ X_{gt} = \text{Post Treatment Indicator} \] \[ \delta = \text{Difference-in-Difference Estimate} \]
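As a rough sketch of the two-way fixed effects setup, the model can be fit with `lm()` by entering group and time as factors. The panel below is entirely hypothetical: four groups, five periods, made-up adoption dates, no noise, and a constant effect of -2 built in so the estimate is easy to check.

```r
# Hypothetical panel: 4 groups observed over 5 periods
panel <- expand.grid(group = 1:4, time = 1:5)

# Assumed adoption periods: groups 3 and 4 are never treated
first_treat <- c(3, 4, Inf, Inf)
panel$X <- as.integer(panel$time >= first_treat[panel$group])

# Made-up group and time effects, plus a constant treatment effect of -2
alpha <- c(10, 12, 8, 9)
gamma <- c(0, 1, 1.5, 3, 2)
panel$y <- alpha[panel$group] + gamma[panel$time] - 2 * panel$X

# Group and time fixed effects enter as factors; the coefficient on X is delta
twfe <- lm(y ~ factor(group) + factor(time) + X, data = panel)
delta_hat <- coef(twfe)[["X"]]
delta_hat
```

With a constant effect and no noise the fit recovers the built-in value. Real data with effects that differ across groups or over time would not behave this cleanly.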
Results (reproduced by Angrist and Krueger)
Parallel trends may only hold conditionally on some observed characteristic
Controls should be based on characteristics that do not change or were measured prior to treatment.
In practice, this is as simple as adding an additional control to a regression.
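A sketch of what adding a control might look like, on hypothetical noise-free data. Here a pre-treatment covariate `x` is more common among treated units and also shifts the trend, so parallel trends hold only conditional on `x`. Interacting `x` with the period indicator is one common way to let the control absorb that trend difference; that modelling choice is an assumption of this example, not something taken from the source.

```r
# Hypothetical units: x is a pre-treatment covariate, more common among the treated
units <- data.frame(
  treated = rep(c(1, 1, 0, 0), times = c(30, 10, 10, 30)),
  x       = rep(c(1, 0, 1, 0), times = c(30, 10, 10, 30))
)

# Stack a "before" (post = 0) and an "after" (post = 1) copy of every unit
panel <- rbind(cbind(units, post = 0), cbind(units, post = 1))

# Outcome: x shifts the trend (1.5 * x * post); the true treatment effect is -3
panel$y <- with(panel,
  1 + 2 * treated + 0.5 * post + x + 1.5 * x * post - 3 * treated * post)

# Unadjusted DiD: picks up the x-driven trend difference as well
naive <- lm(y ~ post * treated, data = panel)

# Adjusted DiD: x and its interaction with post are added as controls
adjusted <- lm(y ~ post * treated + post * x, data = panel)

coef(naive)[["post:treated"]]
coef(adjusted)[["post:treated"]]
```

In this constructed example the unadjusted interaction is -2.25, while the adjusted model recovers the built-in effect of -3.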
With a binary outcome, it generally makes sense to just stick with the linear probability model, despite its flaws
Simplest setup is the pre vs. post treatment cross-sectional case.
Biggest assumption is the parallel trend. Visual examination can help, especially if you have observations for prior outcomes
Including multiple years and cases with staggered treatments can make it easier to justify the parallel trends assumption. “Eventually treated” units are often a more sensible comparison case for units that have been recently treated.
However, there’s an issue with “early treated” and “late treated” units being weighted differently in the results.
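A rough numeric companion to the visual check of parallel trends: with several pre-treatment periods, the gap between the group means should stay roughly constant before treatment begins. The group means below are invented for illustration, with treatment starting in 2007.

```r
# Hypothetical group-by-year outcome means; treatment begins in 2007
means <- data.frame(
  year    = rep(2003:2007, times = 2),
  group   = rep(c("control", "treated"), each = 5),
  outcome = c(10, 11, 12, 13, 14,   # control
              15, 16, 17, 18, 16)   # treated
)

# One row per year, one outcome column per group
wide <- reshape(means, idvar = "year", timevar = "group", direction = "wide")

# Treated-minus-control gap in each year
wide$gap <- wide$outcome.treated - wide$outcome.control

# Under parallel pre-trends the gap is stable across the pre-treatment years
pre_gaps <- wide$gap[wide$year < 2007]
range(pre_gaps)
```

A stable pre-treatment gap is consistent with parallel trends but does not prove the assumption holds in the post-treatment period.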
| year | countyid | first.treat | treated |
|---|---|---|---|
| 2003 | 8001 | 2007 | 0 |
| 2004 | 8001 | 2007 | 0 |
| 2005 | 8001 | 2007 | 0 |
| 2006 | 8001 | 2007 | 0 |
| 2007 | 8001 | 2007 | 1 |
| 2003 | 8019 | 2007 | 0 |
| 2004 | 8019 | 2007 | 0 |
| 2005 | 8019 | 2007 | 0 |
| 2006 | 8019 | 2007 | 0 |
| 2007 | 8019 | 2007 | 1 |
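The `treated` column in a table like this can be derived from `year` and `first.treat`. The rule used below (treated once `year` reaches `first.treat`) is inferred from the rows shown, not stated in the source, and the sketch does not cover how never-treated units are coded.

```r
# Rebuild the rows shown in the table above
panel <- data.frame(
  year        = rep(2003:2007, times = 2),
  countyid    = rep(c(8001, 8019), each = 5),
  first.treat = 2007
)

# Assumed rule: a county-year is treated from its first treatment year onward
panel$treated <- as.integer(panel$year >= panel$first.treat)
panel
```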